Automatic generation of KEGG OC (Ortholog Cluster) and its assignment to draft genomes

Authors

  • Yuki Moriya
  • Toshiaki Katayama
  • Akihiro Nakaya
  • Masumi Itoh
  • Akiyasu C. Yoshizawa
  • Shujiro Okuda
  • Minoru Kanehisa
Abstract

As the number of sequenced genomes is growing rapidly, a method for the automatic generation of orthologous gene clusters is needed. However, clustering a large number of genes at once is computationally hard. To address this problem, we have developed a heuristic method that assigns gene groups from closely related organisms to an ortholog cluster in a bottom-up manner. In this method, we treat each gene subgroup as a representative gene and find correspondences between subgroups using bi-directional best hit (BBH) relations obtained from the KEGG SSDB database, which stores all-vs-all Smith-Waterman similarity scores [1]. We have clustered all the genes in the KEGG GENES database [1] to generate KEGG Ortholog Clusters (OCs), which represent various aspects of the protein universe. As an application of KEGG OC, we have performed automatic gene assignment for draft genomes that are not yet included in KEGG. Our method provides an efficient way to rapidly annotate the genes of newly sequenced organisms.
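
The BBH relation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(organism, gene_id)` tuple layout, and the toy scores are all hypothetical, and the final grouping step uses plain single-linkage merging of BBH pairs as a simplified stand-in for the bottom-up procedure described above. Ties between equal scores are broken by first occurrence, which is also an assumption.

```python
from collections import defaultdict

def best_hits(scores):
    """Map (query gene, target organism) to the best-scoring target gene.

    scores: {(query, target): similarity}, genes are (organism, gene_id) tuples.
    """
    best = {}
    for (query, target), score in scores.items():
        if query[0] == target[0]:
            continue  # hits within the same organism are ignored here
        key = (query, target[0])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return best

def bidirectional_best_hits(scores):
    """Return gene pairs that are each other's best hit across two organisms."""
    best = best_hits(scores)
    pairs = set()
    for (query, _), (target, _) in best.items():
        back = best.get((target, query[0]))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, target)))
    return pairs

def cluster_by_bbh(pairs):
    """Group genes linked by BBH pairs (single linkage via union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return list(groups.values())

# Hypothetical toy scores for two organisms, "eco" and "sty".
A1, A2, A3 = ("eco", "a1"), ("eco", "a2"), ("eco", "a3")
B1, B2 = ("sty", "b1"), ("sty", "b2")
toy_scores = {
    (A1, B1): 900, (B1, A1): 900,
    (A1, B2): 300, (B2, A1): 300,
    (A2, B2): 500, (B2, A2): 500,
    (A2, B1): 200, (B1, A2): 200,
    (A3, B1): 400, (B1, A3): 400,  # one-directional: b1 prefers a1
}
toy_pairs = bidirectional_best_hits(toy_scores)
toy_clusters = cluster_by_bbh(toy_pairs)
```

In this toy data, `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` forms no BBH pair and stays outside both clusters.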

Similar resources

KEGG OC: a large-scale automatic construction of taxonomy-based ortholog clusters

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and ...

Classification of Protein Sequences into Paralog and Ortholog Clusters Using Sequence Similarity Profiles of KEGG/SSDB

We are constructing KEGG/OC (Ortholog Clusters) from KEGG/SSDB (Sequence Similarity DataBase) [2]. KEGG/SSDB contains exhaustive protein sequence similarity scores of completed and nearly completed genomes calculated by the SSEARCH program [3]. KEGG/OC is constructed automatically from the graph analysis of searching cliques with an appropriate definition for the profiles of similarity scores. ...

KAAS: an automatic genome annotation and pathway reconstruction server

The number of complete and draft genomes has been growing rapidly in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith-Waterma...

MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data

The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, ...

Publication date: 2004